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, ! Abstract 

This paper presents a novel power spectral density estimation technique for band-limited, 
, wide-sense stationary signals from sub-Nyquist sampled data. The technique employs multi- 

coset sampling and incorporates the advantages of compressed sensing (CS) when the power 
spectrum is sparse, but applies to sparse and nonsparse power spectra alike. The estimates 
are consistent piecewise constant approximations whose resolutions (width of the piecewise 
constant segments) are controlled by the periodicity of the multi-coset sampling. We show 
^ I that compressive estimates exhibit better tradeoffs among the estimator's resolution, system 

complexity, and average sampling rate compared to their noncompressive counterparts. For 
suitable sampling patterns, noncompressive estimates are obtained as least squares solutions. 
^ ■ Because of the non-negativity of power spectra, compressive estimates can be computed by 

. seeking non-negative least squares solutions (provided appropriate sampling patterns ex- 

■ ist) instead of using standard CS recovery algorithms. This flexibility suggests a reduction 

in computational overhead for systems estimating both sparse and nonsparse power spec- 
tra because one algorithm can be used to compute both compressive and noncompressive 
estimates. 



1 Introduction 

^ I Compressed sensing (CS) is a data acquisition strategy that exploits the sparsity or compress- 

■ ibility of a signal [THl] • Typically, a signal is defined to be sparse if its representation in an 



orthogonal basis contains only a few nonzero coefficients and is compressible if it can be weh 
approximated by a sparse signal. In its simplest setting, CS proposes to sample a sparse discrete 
signal X G M" by acquiring m < n inner product measurements yi = {ai , x.) , < I < m, where a/ 
are the rows of a m x n measurement matrix A. In matrix notation, y = Ax. Remarkably, CS 
asserts that x can be accurately (if not exactly) reconstructed from the measurements y even 
when the linear system of equations is underdetermined, provided A satisfies certain conditions. 
When CS is applied to the problem of sampling continuous-time signals, it has been shown that 
the undersampling of finite length vectors translates into sub-Nyquist sampling rates, where 
CS provides the theoretical justification and tools to reconstruct the signal despite the spectral 
aliasing that can occur [3HH]. 
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Figure 1: (a) An example of multi-coset sampling. Samples (dots) are acquired at nonuniformly 
spaced points over LT seconds with additional samples acquired at the same relative points in 
time over successive blocks of LT seconds, (b) Multi-coset sampler implemented as a multi- 
channel system. 

In this paper, we study the problem of estimating the power spectral density (PSD) of a 
wide-sense stationary (WSS) random process within the context of a sub-Nyquist nonuniform 
periodic sampling strategy known as multi-coset (MC) sampling pHl5j. Because of its nonuni- 
form periodic nature, MC sampling is closely related to CS0 and consequently, the proposed 
estimator easily incorporates and exploits CS theory. The estimator however applies broadly, 
and in particular, applies to cases where the PSD is not sparse. 

For a fixed time interval T and for a suitable positive integer L, MC samplers sample an 
input signal x{t) at the time instants t = {nL + Ci)T for 1 < i < q, n G Z, where the time 
offsets Ci are distinct, non-negative integers less than L (0 < q < L). The strategy is depicted 
in Fig. [TJ The sampling times are known collectively as the multi-coset sampling pattern and 
the sets {{nL + Ci)T : n G Z, Cj G {0, . . . , L — 1}} are cosets of {nT : n G Z}. Because T is 
fixed throughout the paper, we simply refer to {cj} as the multi-coset sampling pattern instead 
of the actual sampling times. MC samplers are parameterized by q, L, and {q} and exhibit an 
average sampling rate of q/LT Hz. They are most easily implemented as multichannel systems 
as shown in Fig. [T] where channel i shifts x{t) by CjT seconds and then samples uniformly at 
l/LT Hz. 

Here, we assume a multichannel MC sampler and propose a novel nonparametric PSD 
estimator based on the MC samples. The method estimates the average power within specific 
subbands of a WSS random process. Hence it produces piecewise constant estimates that are 
in contrast to those supported on a discrete frequency grid, e.g., the periodogram implemented 
using the discrete Fourier transform (DFT). Moreover, the estimator does not use passband 
filters to isolate subband signal components. Rather the estimator uses the spectral aliasing 
that occurs in each channel to underpin the formation of a linear system of equations whose 
solution is the PSD estimate of interest (see Section [2] for details). 

We categorize the solutions to this linear system as either compressive or noncompressive 
depending upon whether CS is employed in the recovery of the estimate. Compressive estimates 
result from the application of CS theory and pertain to situations where the underlying power 
spectrum is sparse and the linear system of equations is under deter mined. Noncompressive 

^In fact, it can be formulated as a CS sampling strategy. 
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estimates pertain to cases where the hnear system is overdetermined (or determined) with the 
power spectrum's sparsity being irrelevant. In both cases, the uniqueness of the recovered 
estimates depends on the choice of the samphng pattern {cj} (not all sampling patterns lead 
to unique solutions). Because of the special structure of the linear system, we show that in 
some cases compressive estimates can be computed using a non-negative least squares (NNLS) 
approach. This result is advantageous because in these cases it eliminates the necessity of 
using two separate algorithms to compute compressive and noncompressive estimates. (Note 
noncompressive estimates are normally computed as least squares solutions but can also be 
computed using NNLS; see Section [312] for details). 

The proposed method also indicates the regions within the (L, q) parameter space that allow 
for compressive and noncompressive estimates, and it illustrates the associated tradeoffs among 
quantities related to these parameters such as the estimator's resolution or the number of MC 
channels. By comparing these tradeoffs one can gauge the benefit of appling CS theory in cases 
where the PSD is sparse. Perhaps surprisingly, the advantage of a compressive versus a noncom- 
pressive estimate is not simply an improved sampling rate, but rather improved tradeoffs 
For example, for a fixed spectral resolution, a compressive estimate might achieve the same 
accuracy as a noncompressive estimate but with a smaller number of channels. In addition, we 
show that both compressive and noncompressive estimates are statistically consistent and show 
that the noncompressive estimate is asymptotically efficient. 

1.1 Related work 

There is a growing literature in applying CS and other sub-Nyquist techniques to power spec- 
trum estimation, most notably in the field of cognitive radios where there is a need to find 
underutilized bandwidth in a crowded spectrum |17H22j Most pertinent to the present discus- 
sion, however, is the work of Ariananda, Leus and Tian |18H20| . These papers also address 
the problem of PSD estimation from sub-Nyquist samples specifically incorporating CS-based 
sampling systems (including MC sampling). Notwithstanding similar goals and approaches, 
the estimator proposed here differs in two significant aspects. First, their method estimates 
the cross-correlations of the MC sample sequences and then recovers an estimate of the auto- 
correlation sequence of the input random process (from which they form a periodogram estimate 
of the PSD). In contrast, the method described here directly recovers a piecewise constant esti- 
mate from the MC cross-correlation estimates. The main difference being that Araiananda et 
al. estimate samples of the PSD whereas we estimate the average power within subbands. An 
implication of this difference is that the method of Ariananda et al. is more computationally 
expensive: their method requires the formation of a large x matrix in the construction 
of their q'^ x (2L — 1) measurement matrix; in contrast, we do not require the construction of 
this large matrix for our smaller q{q — l)/2 -|- 1 x L measurement matrix. Second, we treat the 
compressive case and address the statistical nature of our estimator (specifically taking into 
account the finite availability of data), while Ariananda et al. do not. 

1.2 Main contribution 

In short, the paper's main contribution is the development of a new nonpar ametric, statistically 
consistent, piecewise constant PSD estimator based on sub-Nyquist MC samples. When the 
underlying PSD is sparse, it seemlessly incorporates CS theory, letting the resulting compressive 
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estimator exhibit better parametric tradeoffs compared to their noncompressive counterparts. 
Moreover, when a suitable samphng pattern exists, a NNLS approach can be used to compute 
compressive estimates, suggesting a possible savings in computational overhead for systems 
estimating sparse and nonsparse power spectra. 

2 A PSD approximation from multi-coset samples 

In this section, we derive a piecewise constant PSD approximation assuming full knowledge of 
the MC sequences. This approximation serves as the motivation and model for the noncompres- 
sive and compressive estimators derived in Sections [3l In Section IH we present three examples 
illustrating the character and properties of the estimators. 

2.1 PSD approximation 

Let x{t) be a real, zero-mean, WSS random process with autocorrelation function 
fxxiT) — E x(ti)a;(t2), T = ti — ^2- The Fourier transform (FT) Pxx{^) of rxx{T) is the PSD of 
x{t) [23]. 

rxxir) < > Pxx{^) 

The PSD quantifies the power in any given spectral band of x{t) and is a second order description 
of the random process. Because rxx{T) is real and symmetric, P^xl^) is also real and symmetric. 
Throughout the paper, we assume x{t) is a band-limited process, meaning that Pxx{^) vanishes 
for \lo\ > TiW rad/s p. 376]. For the MC sampling, we set T = 1/W thereby referencing 
the average system sampling rate qW/L to the Nyquist rate W. 

To estimate Pxxi^^) using MC samples, we examine the cross-correlations of the output 
sequences yi{n) = x{n^ + ^),n G Z. Let a and b denote two channels indices, then 

ryayiin,m) = Eya{n)yb{m) 

= Ex{n^ + ^)x{m§ + ^) 

= rxx{win-m) + ^{ca-Cb)) (1) 

= '>'xx{^/k + -^ri^a — Cfc)) 

where the third step follows from the wide-sense stationarity of x{t) and the substitution k = 
n — m. From ([1]), we see that the cross-correlation function ry^y^{k) is equivalent to shifting 
the auto-correlation function VxxiT) by ^{ca — Ch) and then uniformly sampling it at W/L Hz. 
These operations imply the following discrete time Fourier transform (DTFT) pair for k £ Z 
and w G M, 

''^yayb(k) ~ '^xx{^k + jyica — Cf,)) 
X DTFT 

w L^^^-^^^J 1 w 
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Figure 2: Block diagram of the multi-coset sampling system that shows the temporal shifting 
before and after sampling. 



where the relation follows from the shifting and sampling properties of the DTFT [241 p. 98, 
117]. The summation limits are finite for a given uj because x{t) is assumed band-limited. (Here 
we also assume L is even.) 

The phase shift w(^--^i')'^ 

arises from the time shifts introduced in the individual 
channels. For our purposes, however, this phase shift needs to be removed. This can be 
accomplished by shifting the sequences yi{n) by an amount equal to, but opposite, the initial 
delays (see Fig. [2]) . Mathematically, this step can be expressed as 



Zi{n) = {yi-khi){n), nE 



1, 



(3) 



where * denotes convolution and hi{k) = sinc(7r(A; — Ci/W)),i = l,...q, are the impulse re- 
sponses of ideal fractional delay filters [25]. These filters are "fractional" in the sense that 
the time shifts are a fraction of the sampling period L/W. For simplicity, we assume ideal 
fractional delay filters throughout the paper. We note however that there is a large literature 
concerning the design and analysis of digital fractional delay filters (see [25] and the references 
wherein) ranging from highly accurate and computationally expensive filters to less accurate 
and computationally inexpensive ones. The choice of design is largely application dependent 
and a detailed analysis of the impact of imperfect fractional delay filters and is beyond the 
scope of this paper. 

By examining the cross-correlation functions r^^zbik), we see that the fractional delays 
indeed delay ry^yi^{k) by {ca — Ch)/W seconds. 

^zaZbik) =Kza{k + n)zb{n) 



haim)hb{l)ry^y,{k - m + I) 



m I 



a m 



m 



hi,{a — k) 



(4) 



(5) 



where the second step follows from ([3]), a = A; + /, and ha-b{k) denotes the impulse of an ideal 
fractional delay filter with delay {ca — Cb)/W seconds. 
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Equation dSJ also implies that the DTFT of Vz^ztik) is the DTFT of ry^y^{k) multiplied by 

g-j w^^'^~^''^^ . Thus, by multiplying the frequency domain component in ([2]) by e~-' w^^'^"^''^^ ^ 
we obtain the following inverse DTFT relation 



1 2^ 



m=-L/2+l (^g-) 

I-TVW/L L 

/ P^^(a;-27r-^m)e^'=W^ do;. 

J-nW/L 

Evaluating both sides of this equation at A; = produces a linear system of equations over the 
set of all pairs (a, b) of channels: 

L/2 
m=-L/2+l 



P^^(a; - 27r 



27r 



Pxxim) 



This linear system is the basis for our estimator. For a given m, Pxx{m) equals the average 
power of the process x{t) in the spectral subband [(2m — l)7rTy/L, (2m + 1)ttW/L]. Hence 
the set Pxxim)} forms a piecewise constant approximation to Pxx{(^)- Its resolution equals 
the width of the piecewise constant segments (W/L Hz) and is inversely proportional to L, 
the parameter determining the period of the nonuniform samples. Larger L implies finer res- 
olution; smaller L implies coarser resolution. The left hand side of ([7]) is a weighted sum of 
the total average power of x{t). In Section [3l we propose a statistical estimator based on this 
approximation that uses only a finite number of multi-coset samples. Like the approximation 
{^Pxx{m)}, the estimator is piecewise constant and thus does not estimate PSD amplitudes at 
specific frequencies like standard parametric techniques or periodiogram estimates implemented 
using the DFT. 

Letting i index the (2) + 1 combination^ of pairs {a,b), we can express ([7]) in matrix 
notation, 

u = *v (8) 

where u G M9(3-i)/2+i^ v G and 

U = [uq, . . . , lJ'q(^q~i)/2]'^ (T denotes transpose) 

= e~-^'"r (9) 

v = [vo,...,VL-l]'^ 

VI = Pxx{mi) 



In actuality, the total number of pairs is (j) + q, but each of the q cases where a = b contributes a row of 
ones to ^. Thus to avoid the row dependency, only a single row of ones is included in ^. 
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for i = 0, . . . , q{q — l)/2, I = 0, . . . , L — 1, and mi = —L/2 + 1 + 1. The number of equations in 
this hnear system can be doubled by separating ^ and u into their real and imagery parts: 



" Re(u) ■ 




" Re(*) " 


Im(u) 




Im(*) 


A ~ 

= u 







(10) 



where u and ^ have dimensions q{q — 1) + 1 x 1 and q{q — 1) + 1 x L, respectively. 

To calculate the PSD approximation, one would compute the set of cross-correlation values 
{'''zaZiiO) = ^ Za{n)zh{n)} for all pairs (a, 6) and solve for v. Here we briefly consider two 
types of solutions, deferring a more detailed discussion until after the estimator is presented 
in Section [3] (see in particular Sections 13.21 and l3.3p . First, we consider least squares solutions 
when ^ is full rank (rank L) and (|10p is overdetermined. Such solutions are referred to as 
noncompressive. Second, we consider CS solutions when pup is underdetermined and v is 
sparse in the sense that only a few of its entries are nonzero. These solutions are termed 
compressive. 

The existence of a unique least squares solution depends on the (column) rank of ^ which in 
turn depends on the sampling pattern {cj}. In particular, a sampling pattern must be chosen 
such that ^ has rank L (implying that q{q — 1) + 1 > L, or equivalently, that the number 
of rows of ^ is greater than or equal to the number of columns). In Section \3.2\ we discuss 
two methods for constructing sampling patterns that produce full rank but given such a 
sampling pattern, the unique least squares solution is 

vjsfc — argminll^'Q; — u||2 

^ " ^ (11) 

= (*^*)-i*^s 

where ||-||2 denotes the £2 norm, (^^^)~i^'^ equals the pseudoinverse of and the subscript 
NC indicates that the approximation is noncompressive. If q{q — l)/2 + 1 = L and ^ is 
nonsingular, we simply have vjvc = '^'"^u. 

Now, suppose Pxx{^) is spectrally sparse, i.e., suppose its support has Lebesgue measure 
that is small relative to the overall bandwidth. In addition, suppose q{q — 1) + 1 < L such 
that (fTOl) is an underdetermined linear system of equations. Then from a CS perspective, v 
is the sparse vector of interest that is to be recovered from the linear measurements u, and 

is a (subsampled Fourier) measurement matrix. The vector v is sparse because its entries 
represent the average power of x(t) in the subbands [(2m — \)nW/L,{2m + 1)ttW/L] and 
Pxx{^) is presumed to be sparse. In this situation, a host of CS recovery algorithms may be 
used to compute compressive approximations vc provided an appropriate sampling pattern 
and a sufficient number of measurements for a given level of sparsity (see e.g., |26H28j ). For 
example, for a suitable sampling pattern, the convex optimization problem 

min ||q:||i subject to u = ^a, (12) 

would yield a compressive approximation if 0(5" log^ L) measurements were acquired [29], where 
II -111 denotes the £1 norm and S denotes the number of nonzero entries in v. In Section [3.3^ 
we further exploit the structure of this linear system and advocate an algorithm that does not 
require an explicit sparsity constraint in the recovery procedure. 
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3 Proposed multi-coset PSD estimator 

The above PSD approximation is predicated on full knowledge of the output sequences yi{n), or 
equivalently, on having access to an infinite amount of data (an infinite number of multi-coset 
samples). In this section, we derive and characterize an estimator based on ([8]) that explicitly 
accounts for the finite availability of data. 



3.1 PSD estimation 

Again, let x{t) by a real, band- limited, zero- mean WSS random process with PSD P^xii^)- We 
observe x{t) over a finite duration interval and hence model the observed signal x^{t) as a 
windowed version of x(t): 

^ xit)wnit) (13) 

where 

WR{t) = {^ - - (14) 
I U otherwise. 

The MC samples of the windowed process x^{t) are 

yf(n)^x^(^(nL + c,)) ^^^^ 
= x{^{nL + a)) WR{^{nL + c^)), 

for n = 0, . . . , — 1 and i = 1, . . . ,q, where N = DW/L equals the number of samples acquired 
per channel during the D second observation interval. (Here N is assumed to be an integer.) 
The fractionally delayed samples are (c.f. ([3])) 

^fW = (yf (16) 

where again N signifies the number of sample^. 

Let a and b denote the indices of two channels. Then, given (n) and {n)^ the cross- 
correlation function rz^z^ik) can be estimated by the sample correlation 



Af-|fc|-l 

^i!..(^) = ]^ E z^{k + n)z^{n), (17) 

n=0 

for < < A — 1. Mimicking the standard derivation of a periodogram (see e.g. [301 p. 321]), 
we find the DTFT of ^, (A:) to be 



(18) 



where {e^^ w ) and Z^ {e-^^ w ) are the DTFT of z^ (n) and z^ (n) respectively. The transform 
Z^ {e-^^W^ may be further decomposed in terms of YJ^ {e-'^W^ and X^{uj) where 

x^(t) < > A^(a;) 

L (19) 



perfect fractional delay filter has an infinite impulse response hi{n) — sinc(7r(n — a/W)) and thus the 
sequence (n) has, strictly speaking, infinite duration. Ifowever, in p6p we only consider output samples 
(n), n — 0,1, . . . , N — 1. We do not include another windowing operation because (n) will be finite for any 
practical, realizable filter. 
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Specifically, for cj G M, we have from p5|) and (|T6|) that 



W ^ n w -2- (20) 



m=—oo 



Because x^{t) is time limited, it cannot, strictly speaking, be band-limited. However, for 
sufficiently long windows it is reasonable to assume that the vast majority of the energy of 
x^(t) lies within the band (— vrVF, vrVF], i.e., it is reasonable to assume that with sufficiently 
long windows x^{t) is essentially band-limited to the same band as x{t). With this assumption, 

one period of [e-^'^Wj is well approximated by the finite summation, 

L 

Z^(e^^W)=^ Y] - 27rf m)e-^T'=-"^. (21) 

m=-f+l 

Substituting (|2ip for Z^ie^^W) in (jl8p (along with a corresponding expression for 
Z^ {e?^'wy) and evaluating the inverse DTFT at zero yields 

■kWIl ^ T (22) 



L(o)=EE^-^'^^^" 

m n 

r"^ ^^^(c^)[^f (c^)]* 



-■kW/L 

where X!^{ijj) is shorthand notation for X^{ijJ — 27r-^m). 

We observe that for m = n, the integrands, ^\X^{uj)\^ ,m = —L/2 + l,...,L/2, are 
periodogram estimates of L disjoint subbands of Pxx{oj), and the corresponding integrals 

^i^^(-)r p (23) 

are estimates of the power within these subbands. The set {^Pxx{fn)} thus constitutes a 
piecewise constant estimate of Pxx{^) at resolution W/L. 

In contrast, the cross terms in (I22p (i.e., terms for which m ^ n) asymptotically approach 
zero as or equivalently A^, grows large. To see this, rewrite the integrand in (I22p as 

ix^(^)[xf(^)]* 



1 

'd' 

""^ da 



/oo 
WR{a)X{uj - 27r-f m - a) 
-oo 



(24) 



H^^(/3)X*(a;-27rfn-/?) 
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where Wji{uj) = D sinc(ya;)e 2 is the FT of the rectangular window wii{t), X{uj) is the FT 



of x{t) appropriately defined (see [23^, p. 416]), and 

sinc(a) : 



1, if a = 

sin (a) /a, otherwise 



As — )• 00, Wr{uj) approaches a delta function [31, p. 280]; however because of the factors 
1 /^/D, the integrals approach zero as D — t- 00 for m ^ n. 

Because of this fact, we choose to approximate (0) using only the terms m = n in 



ZaZb 

.271 



L(0)-E^"'"^""'^'^'"^-("^)- (25) 



Letting i index all pairs {a,b), we arrive at an empirical counterpart to ([H]) 

G = *v (26) 

where 



u 



[no, . . . ,S^(q-l)/2]' 



ip.^ = (,-j^i''^'~''b)^m (27) 

V = [vo, . . .,VL-lf 

VI ~ Pxxim) 

for i = 0, . . . , q{q — l)/2, / = 0, . . . , L — 1, and mi = —L/2 + 1 + 1. Similar to (flOl) . we can 
double the number of equations by separating the real and imagery parts of ^ and u: 

G = *9 (28) 

where u = [Re(u) Im(u)]'^ and ^ G 1^9(9-1)+! defined in (llOp . In this equation, v is the 
PSD estimator we wish to solve for. 



3.2 Noncompressive solutions (overdetermined case) 

A unique least squares solution to (I28p exists if ^ has full column rank, i.e., rank(^) = L. Full 
rank implies q{q — 1) + 1 > L, or that the system is overdetermined (here we treat a determined 
system as a special case of the overdetermined one). In this case, a noncompressive estimate 
can computed using the pseudoinverse 

V7VC = (*^*)~^*^G, (29) 

where (^"^^)~^^"^ is the pseudoinverse of A solution of this type is noncompressive because 
it does not rely on CS theory, and in particular, it does not require v to be sparse. 

For a fixed L, a full rank ^ requires (i) choosing q large enough such that q{q — 1) + 1 > L 
and (ii) choosing a sampling pattern such that rank(^) = L. The first condition is required 
since q{q — 1) + 1 < L implies rank(^) < q{q — 1) + 1, regardless of the sampling pattern. 
The second condition is necessary since there exists sampling patterns that lead to ^ having 
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250 




250 



Figure 3: Four scenarios showing that random sampHng patterns can produce well-conditioned, 
full rank ^ after some threshold point in the noncompressive case. Left: Fraction of times 
random patterns produce full rank ^ for different (L, q) values. Right: Corresponding condition 
numbers. 



a non-empty nullspace even if q(q — 1) -|- 1 > L, meaning that the least squares solution is no 
longer unique. Section [3^ discusses the tradeoff between q and L imposed by the first condition 
assuming the second condition is met. Here we address the second condition. 

Fig. [3] offers empirical evidence suggesting that generating a sampling pattern uniformly at 
random from {0, . . . , L — 1} can be a simple and effective way of constructing a full rank 
provided the ratio of its dimensions is small enough. The plots show the results from 

four trials corresponding to four different values of L. For each trial, L was fixed and q was 
allowed to range from its minimum value — the smallest integer value satisying q{q — l) + l > L — 
to its maximum value L — 1. For each {L,q) pair, 500 sampling patterns (of length q) were 
generated and the fraction of those that produced full rank ^ are recorded in the left hand 
plot. The right hand plot displays the average condition number of ^ in each case. Both plots 
suggest that after some threshold point random sampling patterns consistently produce well- 
conditioned and full rank matrices for the noncompressive problem. For the four cases shown, 
the threshold occurs when the ratio roughly equals 0.12, but a formal treatment of 

this phenomenon is beyond the scope of this paper. 

For larger ratios (still falling within the noncompressive problem), full rank 4f can 

sometimes be constructed by choosing the sampling pattern to be a Golomb ruler [32]. Golomb 
rulers are integer sets whose difference sets contain unique elements. The cardinality of the 
integer set is the ruler's order and the largest difference between two integers is its length. 
The idea is that the integers represent marks on an imaginary ruler and the elements of the 
difference set are all the lengths that can be measured by the ruler. The goal in the study of 
Golomb rulers is to find the minimum length rulers for a given order. In the present context, 
however, we are only interested in using already tabulated rulers. For example, for L = 64 and 
q = 10, one can take the minimum-length order-10 ruler to be the sampling pattern 

{cai£i = {1, 2, 7, 11, 24, 27, 35, 42, 54, 56}. 

Note that the ruler's length, 55, is less than L = 64 as it should be for MC sampling. This 
pattern yields a full rank ^ with condition number 1.4 whereas in Fig. [3] random patterns 
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never produced a full rank matrix out of 500 test cases. The implication of this improvement 
is somewhat limited however because there are relatively few known (up to order 26) minimum 
length Golomb rulers. For instance with L = 1024, the minimum q value is 33, but there 
are no known minimum length Golomb rulers with orders greater than 26. When applicable, 
Golomb ruler sampling patterns can be an effective means to obtain a full rank ^ when the 
ratio —/ — 4ttt is near the threshold value. Note also that Ariananda et al. advocate the use 



of Golomb rulers in their study of multi-coset PSD estimation, but they used it to ensure the 
rank of a binary matrix |20j . 

To summarize the foregoing, one computes a noncompressive estimate by fractionally de- 
laying each channel of MC samples yf{n),i = 1, . . . ,g, computing the cross-correlation values 
{^W6(0)}> and solving (l26]) via (f29D . 

3.3 Compressive solutions (underdetermined and sparse case) 

When (j28p is underdetermined and the sampling pattern is chosen uniformly at random from 
{0, . . . ,L — 1}, ^ is a randomly subsampled DFT matrix which is a well-known type of CS 
measurement matrix. Thus a number of CS recovery algorithms (e.g., [26'-'28]) will solve (I28p . 
provided v is sparse and the number of CS measurements (length of u) is sufficient. However, 
because (j28p has special structure, the usual CS recovery algorithms may be replaced by a 
non-negative least squares approach. This claim rests on the following result of Donoho and 
Tanner [33] (see also [53]): 

Corollary 1 ( [35], Corollary 1.4 and 1.5). Assume d is even and for n = 0, 1, 2, . . . , L — 1, let 

A G M''^'^ be the partial DFT matrix (d < L) 



Let X be a non-negative s-sparse vector, i.e., let x have at most s nonzero positive entries. 
Then given the linear measurements b = Ax, any approach that solves the linear system and 
maintains nonnegativity will correctly recover the s-sparse solution x, provided d > 2s. 

The corollary says that when x is s-sparse and non-negative, one needs only 2s Fourier 
measurements to uniquely recover it. Moreover, the theorem states that x can be recovered 
using any algorithm that produces a nonnegative solution to Ax = b. The fact that a L 
dimensional s-sparse vector x can be recovered from only d < L Fourier measurments is an 
archetypical CS result. However, in contrast to typical CS recovery strategies, signal recovery 
in this case does not require explicit use of a sparsity constraint like ii minimization (see for 
example (I12p ). The reason is that this result is a specific case of a more general phenomenon in 
the recovery of sparse nonnegative vectors for underdetermined linear system of equations. In 
essence, it is known that if the row span of a measurement matrix intersects the positive orthant 
and if the vector to be recovered is nonnegative then the solution to the underdetermined linear 
system is a singleton, provided the vector of interest is sparse enough [35l[36] (see also j37j). 
This finding is key since it implies that compressive PSD estimates can be obtained in a manner 
similar to noncompressive estimates, i.e., they can be obtained using NNLS (more about this 
point in a moment). 





k = 1,3,5, . . . ,d — 1 
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The application of Corollary [T] to (j28p is straightforward. The required matrix A is a 
permuted version of ^ provided the sampling pattern is chosen such that the set of differences 
(ca — Cfe) form the consecutive sequence 0, 1, 2, . . . , s — 1. In particular, A can be constructed 
from ^ as 



The value for q is chosen such that q{q—l) + 1 > 2s, i.e., such that the number of measurements 
(length of u) is greater than twice the sparsity. 

One strategy to construct a sampling pattern that forms the difference set {0, 1, 2, . . . , s — 1} 
is to again use Golomb rulers. Most Golomb rulers do not produce a difference set that is a 
consequence sequence up to its length, but because differences do not repeat for Golomb rulers, 
they tend to furnish nearly consecutive sequences. For example, the order-7 minimum length 
ruler {1, 3, 4, 11, 17, 22, 26} produces the difference set: 



which nearly forms the consecutive sequence 0, . . . , 25, missing only the elements 12, 17, and 20. 
In Section 14.21 we present an example that uses this sampling pattern to successfully recover 
a 16-sparse vector vq- If an appropriate Golomb ruler (sampling pattern) does not exist for a 
given L and s, one can use other CS recovery approaches (e.g., ii minimization, CoSaMP |27] . 
or iterative hard thesholding [28j) with a randomly generated sampling pattern. 

The non-negativity requirement of Corollary [T] for v is met asymptotically as — t- oo. This 
claim can be inferred from Section [3] where it is shown that the entries of v approach integrated 
periodogram estimates, which are non- negative by construction (see (I22p through (|25p ). For 
PSD estimation, the non-negativity requirement is natural since by definition power spectra 
are non- negative. 

When Corollary [T] can be applied, we compute compressive estimates vc using NNLS: 



which is an optimization problem that can be solved via a linear program. Choosing a NNLS 
approach for compressive estimates is advantageous because noncompressive estimates can also 
be computed using a NNLS algorithm (and is perhaps preferred since vj^c can be negative; see 



3.4 Tradeoffs in noncompressive case 

Given a sampling pattern that produces a full column rank the relation q{q — 1) + 1 > L 
establishes a tradeoff between q and L (Fig. d]) and tradeoffs among other related quantities 
(Fig. [5]). For example, it establishes a tradeoff between system complexity and the estimate's 
resolution, where here system complexity is simply taken to be the number of channels q. As 
the left hand plot in Fig. [5] shows, finer resolution (smaller W/L) comes at the price of higher 
system complexity. As a concrete example, consider a WSS signal x{t) band-limited to 1 GHz 
and a desired resolution of 5 MHz, implying L = 400. A noncompressive solution would then 
require at least 21 channels; however, if a resolution of 15 MHz suffices, the number of channels 
could be reduced to 13 (see Fig. 




(30) 



{0,1,2,3,4,5,6,7,8,9, 10,11, 

13, 14, 15, 16, 18, 19, 21, 22, 23, 25} 



(31) 



Vc = argmin||^Q! — u||2 subject to o; > 0, 



(32) 



Fig. El). 
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Figure 4: Tradeoffs between parameters q and L. The lightly shaded areas in both plots de- 
marcate regions in the parameter space where noncompressive estimates exist given that ^ 
is full rank. The bounding curve is q{q — 1) + 1 = L. The darker shaded areas are regions 
where compressive estimates exist and represent the possible gains in terms of improved param- 
eter tradeoffs over the noncompressive case. The compressive regions derive from the specific 
scenario where v is 16-sparse at resolution L = 128. 

If one has the the freedom to choose q and L, the ratio q/L can be driven to zero even while 
maintaining the inequality q{q — 1) -|- 1 > L. This means that noncompressive estimates can 
theoretically be computed at arbitrarily low sampling rates and at finer and finer resolutions 
(see right hand plot of Fig. [S]). Estimation at arbitrarily low sampling rates is a property shared 
with exisiting alias-free PSD estimators |38H42j that are based on random sampling where the 
set of sampling instances are (typically) realizations of a stochastic point process (e.g., a Poisson 
point process). Arbitrarily low sampling rates and arbitrarily fine resolutions, however, require 
arbitrarily high numbers of channels and arbitrarily long signal acquisition times. Thus in 
practical noncompresive situations, an appropriate balance needs to be struck between q and 
L that satisfies the requirements of a given application. Section 14.11 provides evidence of this 
tradeoff in a specific example. 

3.5 Improved tradeoffs in compressive case 

With sufficient sparsity, ()28p can be uniquely solved even if it is underdetermined, i.e., even 
if q(q — 1) -|- 1 < L [UElHj. This means that compressive estimates vc can be computed with 
better q — L tradeoffs compared to noncompressive ones, i.e., for a given L, a compressive 
estimate can be recovered with a smaller q than what is possible for a noncompressive estimate, 
or conversely, for a given q, a compressive estimate can be recovered with a larger L. 

The degree of the gain depends on the level of sparsity of v at a specific resolution L. Let 
s denote the sparsity (number of nonzero elements) of v at resolution L. Then according to 
Corollary [H a compressive estimate can be computed with 2s or more Fourier measurements. 
Because the number of measurements in ()28p equals q{q — 1) + 1 (length of u), the need for 
2s measurements sets a minimum q. Since compressive estimates pertain to underdetermined 
linear systems, the condition q{q — l) + 1 < L sets a maximum q. To apply Corollary [U however, 
a sampling pattern of length q must exist that produces the set of differences 0, ...,s — 1. If 
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Figure 5: Tradeoffs among resolution, system complexity, and system sampling rate. Left: 
The progression from point (1) to point (2) shows that for a noncompressive estimator the 
reduction in the number of channels comes at a price of coarser resolution. However, for a 
16-sparse vector v at resolution 15 MHz, a compressive estimator can reduce the number of 
channels to 7 — progression from (2) to (3). Right: Corresponding tradeoff between resolution 
and system sampling rate. 

sampling pattern does not exist for the minimum q value, larger q values must be considered. 
Thus, for a compressive estimate, the value of q depends on the sparsity of v (for a given L) 
and on the availability ability of an appropriate sampling pattern. We discuss specific examples 
in the next paragraph. Note that if an alternative CS recovery algorithm is used to solve (|28|) 
instead of applying Corollary [H larger q values will be needed for a given s (and L) because 
these algorithms required larger numbers of measurements. 

We illustrate the improved tradeoffs in two scenarios each of which are based on the pre- 
sumption that V is a 16-sparse vector at resolution L = 128 (v G R^^^). The results are 
graphically shown in Fig. [H As described above, the light shaded regions represent the (L, q) 
pairs for which noncompressive estimates can be computecfl, and are bordered by the curve 
q{q — 1) + 1 = L. The darked shaded regions in each panel represent the (L, q) pairs for which 
compressive estimates can be computed. The difference between the compressive regions on the 
left and right plots is that the region on the left presumes s changes with L and the one on the 
right presumes s is independent of L, i.e., it remains constant as L varies. In the left hand plot, 
we assume s doubles every time L doubles. Thus, in comparison to the initial presumption 
s = 16 at -L = 128, we assume s = 32 at L = 256 which implies a mimimum q = 9. This 
behavior models the situation where the active subbands of x{t) at resolution L = 128 contain 
energy at every frequency within the subband, so as the subbands are subdivided, s increases. 
The scenario on the right reflects the situation where the active subbands only contain energy 
at one specific frequency, so s remains constant as L increases. 

To judge the implications of these tradeoffs, we examine the associated tradeoffs among 
system complexity (q), resolution (W/L), and average system sampling rate (qW/L) in Fig. O 
As above, the compressive regions are drawn under the presumption that s = 16 for L = 
128. The plots, however, are only for the case where s is independent of L. Intuitively, this 

^Again, the ability to compute a noncompressive estimate for a valid (i, q) pair also depends on whether a 
suitable sampling pattern can be found. 
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represents a best case scenario in terms of "CS gain" over the noncompressive case (best possible 
improvements in q for a fixed L and in L for a fixed q). The left hand panel shows system 
complexity as a function of resolution W/L where W = 2 GHz. The right hand panel plots 
resolution versus system sampling rate. The plots show that for this particular scenario, a CS 
approach can have appreciable benefits when the underlying PSD is sparse. For example, in 
Section [3.41 we stated that the recovery of v with a resolution of 15 MHz required a 13 channel 
MC sampler for a noncompressive estimator. However, for a 16-sparse vector v, a compressive 
estimate with the same resolution can be recovered with only q = 7 channels, a reduction of 
almost 1/2. This gain is depicted in Fig. [5] as the shift from point (2) to point (3). The same 
reduction factor can also be seen in average system sampling rate (right hand plot of Fig. [5]). 

These improved tradeoffs may or may not effect the error in the recovered estimate. If 
L is held fixed and v is sparse, then as described above, q can be reduced to the smallest 
integer such that q{q — 1) + 1 > 2s (according to Corollary [T]) . This reduction in q reduces 
the overall system sampling rate; however, the error in the recovered compressive estimate is 
not significantly different from the error in a corresponding noncompressive estimate because 
CS theory guarantees accurate (if not exact) recovery of v. Now, if q is held fixed and L is 
increased, the sampling rate of each channel decreases. Thus, if the observation interval is 
fixed, the correlation estimates Ui become worse (in the sense that their error is larger) because 
less and less samples are used to compute Ui. In this case, CS still allows the recovery of an 
estimate, but its quality is declining as L increases. To maintain accuracy, one must increase 
the observation interval and collect more samples per channel. 

3.6 Consistency of the noncompressive estimator 

To simplify notation, we examine the bias and variance of the least squares solution to u = 
instead of the solution of the expanded version u = ^'v. The analysis is equivalent in either 
case. Assuming ^ is full rank, the noncompressive solution is 

= (*-^*)"^*^u 

= 

where H denotes Hermitian transpose and is shorthand notation for the pseudoinverse. 

Bias. The bias of v^vc is the amount its expected value differs from v^vc = 'J'^u. Taking 
the expectation, we have Ev^rc = ^'^Eu. One element of Eu equals 

ES,= Efi^,^(0) 

^ oo oo 

m=—oo l=—oo 
N-1 

E [yain - m)yb{n - l)]wR{n - ■m)wB{n - I) 

n=0 

= ^ ^ haim)hbil)ry^y, (l - m) 

m I 

n=0 
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where, as in (p7|) . i is an index for the pair (a, b), the second equahty follows from ([16]) and (fT7|) . 
and WR^n) is a discrete rectangular window 



WR{n) 



1 0<n<iV-l 
otherwise. 



It is straightforward to show that 

N-l 

N 



, , , (36) 
_ri_^ |^_/|<Ar_i 

\o \m - l\ > N - I. 

By substituting this expression back into (|35p , it is clear that v^vc is a biased estimate for finite 
A^; however asympotically, ^^nc is unbiased since 



Eui ^='^^^ha{m)hh{l)ry^y,{l-m) (37) 



m I 

r,„,,(0) (from Q) (38) 
Ui. (39) 



implies E Vjvc = 'J'^^Eu — )• = v as — )• oo. 
Variance. The covariance matrix of Vjvc is 

E {VNC - IE [VATC]) (VATC - E [VATC])^ 

= *tE(Q_EQ)(Q-EQ)-f^[*^]-^ (40) 

^ V ' 

where K is the covariance matrix of u. The diagonal elements of (140 p are the variances of the 
elements of v^c- 

M^i) = Y.T.^h^^M^' ^ = o,...,L-i (41) 

where Kjj = covar (uj, n-,) . Note that in (j4ip we have dropped the subscript NC in our notation 
for the elements of vjvc- To show that vav{yi) — )■ as — )• c«, we proceed to upper bound (flTl) 
with a sequence in that approaches zero as A^ — )• oo. 

First, we note that the inequality [covar (uj, -Uj)]^ < var ({ij)var (n-,) |43] p. 336] implies 



covar m 



i,Uj) \ < Y^var({ij)var(nj). (42) 
Second, for the correlation estimate Ui = r^^^{0) it is well-known [30, p. 320] thalU 

var(fi:.^(0)) . A {l-^^yi^^n.). (43) 

m=-(N-l) 



^The reason for the approximation in (|43p is that this expression is derived assuming that Za{n) and zi,{n) 



are jointly Gaussian; however, the result applies to other distributions as well [30j. 
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Hence, if r'^^^^{m) is square-summable, var(fj^2^(0)) = vai{ui) approaches zero as — )• oo. 
Bound dll]) by 

var(?^/)<EEl*L-l iKijI (44) 

Then from ([^2]) and (H3]) . we conclude that var(v;), and thus var(vjvc'), approach zero as 
oo. 

The noncompressive estimator vnc is thus statistically consistent since it is asymptotically 
unbiased and since its variance decreases to zero as N ^ oo. Incidentally, if x{t) is a zero-mean 
Gaussian WSS random process, vjyc is also asymptotically efficient [lH p. 471], meaning that 
the variance of the limit distribution of v^rc achieves the Cramer- Rao bound. As we show 
in the Appendix, asymptotic efficiency follows from the fact that u is a maximum likelihood 
estimate of u. 



3.7 Consistency of the compressive estimator 

A detailed analysis of the compressive estimator's consistency is beyond the scope of this paper 
because it depends on the specific CS recovery algorithm is employed. Nevertheless, compressive 
estimates are consistent in the sense that ideally sparse vectors v can be exactly recovered by 
some CS recovery algorithms, and thus the consistency of u (as shown above) implies the 
consistency of vc. 



4 Examples 



4.1 Estimating MA spectra with Hnes 

Consider sampling a (nonsparse) moving average power spectrum with spectral lines using a 50 
channel MC sampler (q = 50, L = 64). The process is generated by passing white noise (with 
unit variance) through a filter with transfer function H(z) = (1 — z~^){l + z~^)^ and then 
adding the two sinusoidal components, 2 cos (^ff-) and 2 cos (^^g^)- Fig. 6(a) shows the true 



PSD. Fig. |6(b)] and Fig. 6(c) show two resulting noncompressive estimates for different values of 
A^, the number of MC samples per channel. For comparison, a coarse resolution approximation 
of the true PSD is overlaid on the estimates. Because vnc is consistent, its mean squared 
error with respect to this coarse resolution approximation decreases to zero as — )• oo. This 
is depicted in Fig. 6(c) Fig. [7] further illustrates the consistency of vnc within the context of 



this example for various (L, q) values. In each case, the average squared error monotonically 
decreases as A^ increases. 



4.2 Sparse multiband power spectra 

Suppose a real WSS random process x{t) has a sparse multiband power spectrum band-limited 
to 1 GHz {W = 2 GHz). Suppose also that we are interested in a PSD estimate having a 
resolution of about 15 MHz. To satisfy this requirement we choose L = 128 which gives a 
resolution of 15.625 MHz. Suppose also that the sparse PSD is composed of two active bands, 
each with an approximate bandwidth of 30 MHz. Being conservative, we thus expect v to be a 
16-sparse vector (including positive and negative frequencies with each active band taking up 
4 adjacent subbands). 
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(b) q = 50,L = 64,iV = 50 
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Figure 6: Estimating MA spectra with lines, (a) True power spectrum, (b) Overlay plot of a 
noncompressive estimate and a coarse resolution approximation of the true power spectrum. 
The height of the bars represent the average power in the band which it spans, (c) Estimate 
of same spectrum except the number of samples per channel, N, increased from 50 to 10000. 
Note the convergence of the noncompressive estimate. 
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Figure 7: Decrease in average squared error for various noncompressive scenarios as — )• oo. 
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Figure 8: Estimating sparse multiband power spectra. The overlay plots compare compressive 
and noncompressive estimates to a coarse resolution Welch estimate (black) based on Nyquist 
rate samples, (a) Noncompressive estimate (magenta) with random sampling pattern, (b)-(c) 
Compressive estimates (blue) with Golomb ruler sampling pattern, (d) Minimum norm least 
squares estimate (red) in underdetermined case. 
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The PSD can be estimated in either a noncompressive or a compressive manner. Choosing 
q = 20, leads to an overdetermined system in (|21|) {q{q — 1) + 1 = 381 > 128 = L) and 



a noncompressive estimate. Fig. 8(a) shows the resulting noncompressive estimate with a 
normalized square error of 0.010. The error is calculated with respect to a coarse Welch estimate, 
i.e., it has the same resolution as the noncompressive estimate. The Welch estimate is based 
on uniform Nyquist rate samples and is computed with no overlap. The sampling pattern was 
chosen uniformly at random from the set {0, . . . , 127} and it was verified that ^ had full rank. 
The fractional delays were implemented by interpolating the MC sequences to the Nyquist rate, 
shifting them by the appropriate delays, and then subsampling them to the original rate. Each 
channel collected 1000 MC samples and the average system sampling rate was about 1/6 of the 
Nyquist rate. 

According to Corollary[Tl the number of measurements required to recover a 16-sparse vector 
needs to be greater than or equal to 32. This implies that q needs to be at least 7 because the 
number of measurements in (I28p equals q{q — l) + l and for q < 7 this quantity is less than 32. In 
addition. Corollary [1] requires that the sampling pattern produces the set of differences 0, . . . , 15 
for a 16-sparse signal. Fortunately, a Golomb ruler of order 7 ({cj} = {1,3,4,11,17,22,26}) 
exists that produces the necessary difference set. Fig. |8(b)| shows the resultant compressive 
estimate (normalized squared error 0.023). The reduction in the number of channels from 13 
to 7, in going from the noncompressive to the compressive estimator, reduces the complexity 
of the sampler, or equivalently, halves the average system sampling rate. 

Fig. 8(c) displays the same compressive estimate as in Fig. |8(b)] except that it is computed 
with 10000 samples per channel instead of 1000. The normalized squared error is 0.013. In 
contrast to this figure. Fig. |8(d)] shows the minimum {£2) norm least squares solution computed 
by the pseudoinverse. As one would expect, this solution is far from the true PSD and does 
not improve with more samples. 



4.3 Spectral sensing for cognitive radio 

To opportunistically use their resources, cognitive radios must be able to dynamically sense 
underutilized portions of the radio spectrum. Towards this end, methods and algorithms have 
recently been proposed to sense the largest possible bandwidth while sampling at the lowest 
possible rate |19j. Some of these methods have taken advantage of CS, but the following example 
shows that a noncompressive estimator can monitor large bandwidths at low sampling rates 
(although, as explained above, CS will allow better tradeoffs with a compressive estimator). 

This example is only a caricature of the actual problem that must be solved to realize a 
practical cognitive radio system. Let x{t) be a MA random process generated from filtering 
white Gaussian noise with a notch filter. The filter notches the spectrum such that it has 
approximately two 80 MHz stop bands within an overall band of 1 GHz {W = 2 GHz). Fig. 
shows two noncompressive estimates and compares them to a Welch estimate of the same 
resolution and calculated using Nyquist samples. The left plot sets g = 25 and L = 64 and the 
right plot sets g = 50 and L = 128, meaning that the resolution of left hand estimate is half the 
resolution of the right hand estimate (half as fine) and that the sampling rate per channel on 
the left is twice that of the right (31.25 MHz vs 15.625 MHz) In each case, however, the overall 
average system sampling rate is the same (781.25 MHz). The noncompressive estimate on the 
left is formed with 4096 MC samples per channel, the one on the right with 2096 samples. 
Both plots suggest that spectral holes could be detected simply by thresholding the estimates 
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Figure 9: Estimating spectra with holes. The plots show two different noncompressive estimates 
(differring resolutions) of a MA spectrum with two spectral notches (holes). 



and hence noncompressive estimators may provide a way to monitor large bandwidths at low 
sampling rates. 

5 Conclusion 

In this paper, we derived and analyzed a consistent, MC based PSD estimator that produces 
both compressive and noncompressive estimates at sub-Nyquist sampling rates. The estimator 
does not estimate the power spectrum on a discrete grid of frequencies but instead computes 
average power within a given set of subbands. Compressive estimates leverage the sparsity of the 
spectrum and exhibit better tradeoffs among the estimator's resolution, complexity, and average 
sampling rate compared to noncompressive estimates. Given suitable sampling patterns, both 
compressive and noncompressive estimates can be recovered using NNLS, thus avoiding the 
overhead of computing the estimates in different ways. The estimator is consistent and has 
wide applicability; it is especially attractive in wideband applications where high sampling 
rates are costly or difficult to implement. 

6 Appendix 

To show that v^rc is asymptotically efficient, we show that it is a maximum likelihood estimator 
(MLE) because MLEs are known to be asymptotically efficient |43t p. 472]. 

Define z(n) to be the nth snapshot of samples across all the channels of a MC sampler: 

z(n) = [zi{n), Z2{n),..., Zq{n)f . (45) 

The entries of z(n) are the nonuniform samples collected within one period {L/W seconds) of 
the MC sampling sequence. The upper triangular portion of the sample correlation matrix S 
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are the elements of u: 
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(46) 



(47) 



Suppose x{t) is zero-mean WSS Gaussian random process. Then each snapshot z(n) is an 
i.i.d. reahzation of a jointly Gaussian random vector with correlation matrix R. Taking R to 



be the parameter of interest, the log-likelihood function given the data {z,{n)}^^Q is 



L(R;{z(n)}) 



-qN 



N 



N 



In 27r In det (R) tr [R'^S 



(48) 



2 2 ^ ' 2 

The maximum likelihood estimate of R can be found by taking the gradient of i^(R; {z(n)}) 
with respect to R and setting the result equal to zero. 



aL(R;{z(n)}) 



N 



-1\T 
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+ — (R^SR 
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(49) 



9R 2 

Solving for R reveals that the maximum likelihood estimate of R equals the sample correlation 
S, or in other words, the entries in u are maximum likelihood estimates of ?'zf,zi,(0) for all 
combinations of pairs (a, 6). 

Now since wnc = ^^u, we have by the invariance property of maximum likelihood estima- 
tors |43y45| that ^nc is a maximum likelihood estimate of v. We therefore conclude that v^vc 
is asymptotically efficient when x{t) is a zero-mean WSS Gaussian random process. 
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